Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach

نویسندگان

  • Yan Gao
  • Ming Yang
  • Alok N. Choudhary
چکیده

Image spam is a new trend in the family of email spams. The new image spams employ a variety of image processing technologies to create random noises. In this paper, we propose a semi-supervised approach, regularized discriminant EM algorithm (RDEM), to detect image spam emails, which leverages small amount of labeled data and large amount of unlabeled data for identifying spams and training a classification model simultaneously. Compared with fully supervised learning algorithms, the semi-supervised learning algorithm is more suitedin adversary classification problems, because the spammers are actively protecting their work by constantly making changes to circumvent the spam detection. It makes the cost too high for fully supervised learning to frequently collect sufficient labeled data for training. Experimental results demonstrate that our approach achieves 91.66% high detection rate with less than 2.96% false positive rate, meanwhile it significantly reduces the labeling cost.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards Self-Exploring Discriminating Features

Many visual learning tasks are usually confronted by some common diiculties. One of them is the lack of supervised information, due to the fact that labeling could be tedious, expensive or even impossible. Such scenario makes it challenging to learn object concepts from images. This problem could be alleviated by taking a hybrid of labeled and unlabeled training data for learning. Since the unl...

متن کامل

Towards Self-Exploring Discriminating Features for Visual Learning

Many visual learning tasks are usually confronted by some common difficulties. One of them is the lack of supervised information, due to the fact that labeling could be tedious, expensive or even impossible. Another difficulty is the high dimensionality of the visual data. Fortunately, these difficulties could be alleviated by using a hybrid of labeled and unlabeled training data for learning. ...

متن کامل

Semi-supervised logistic discrimination via regularized Gaussian basis expansions

The problem of constructing classification methods based on both classified and unclassified data sets is considered for analyzing data with complex structures. We introduce a semi-supervised logistic discriminant model with Gaussian basis expansions. Unknown parameters included in the logistic model are estimated by regularization method along with the technique of EM algorithm. For selection ...

متن کامل

Self-Supervised Learning for Object Recognition based on Kernel Discriminant-EM Algorithm

In Proc. of IEEE Int’l Conf. on Computer Vision, Vancouver, Canada, 2001 It is often tedious and expensive to label large training data sets for learning-based object recognition systems. This problem could be alleviated by selfsupervised learning techniques, which take a hybrid of labeled and unlabeled training data to learn classifiers. Discriminant-EM (D-EM) proposed a framework for such tas...

متن کامل

Training SpamAssassin with Active Semi-supervised Learning

Most spam filters include some automatic pattern classifiers based on machine learning and pattern recognition techniques. Such classifiers often require a large training set of labeled emails to attain a good discriminant capability between spam and legitimate emails. In addition, they must be frequently updated because of the changes introduced by spammers to their emails to evade spam filter...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009